May 14th 2020

Project requirements

Follow the IMRAD standard scientific structure:

  • Introduction
  • Materials and Methods
  • Results (and)
  • Discussion

Keep a technical focus, but take care to communicate whichever biological insights you arrived at.

Should not include all your code (we will look into that at the individual examinations), but rather focus on the broader picture of what you did and include data summaries and visualisations

Created using the rmarkdown ioslides_presentation format (i.e. the right-most doc column in the project organisation will be an R Markdown-based presentation)

Project outline:

  • Loading and cleaning data
    • Merge datasets
    • Map locations to country
  • Augmentation of data
    • Group snake subspecies and venom types
  • Initial Analysis and visualisations
    • Venom composition
    • Geographical distribution
  • Unsupervised analysis
    • PCA
    • K-means clustering
  • Supervised classification model
    • Artificial Neural Network (ANN)

Introduction

  • Intro to snake venom
  • Data set for the study:
    • Venom compositions from snakes all around the world
  • Goal of study:
    • Group snakes by genus based on venom composition (PCA, K-means, ANN)

The datasets

  • Main data
## # A tibble: 242 x 100
##   Snake Reference Note  `SVMP (Snake Ve… `PI-SVMP (Snake… `PII-SVMP (Snak…
##   <chr> <chr>     <chr>            <dbl>            <dbl>            <dbl>
## 1 Agki… https://… Mexi…             24.5                0                0
## 2 Agki… https://… Cost…             30.8                0                0
## 3 Agki… https://… Mexi…             30.6                0                0
## 4 Agki… https://… Orig…             32.5                0                0
## # … with 238 more rows, and 94 more variables: `PIII-SVMP (Snake Venom
## #   Metalloproteinase PIII), %` <dbl>, …
  • New data
## # A tibble: 27 x 4
##   Toxin               `Vipera aspis asp… `Vipera berus ber… `Vipera anatolica s…
##   <chr>                            <dbl>              <dbl>                <dbl>
## 1 SVMP (Snake Venom …               13.4                 NA                 42.9
## 2 3Ftx (three-finger…               NA                   NA                 NA  
## 3 Unknown peptides                  NA                   NA                 23.5
## 4 PLA2 (Phospholipas…               30.9                 NA                  8.2
## # … with 23 more rows

Materials and methods

Augmented data

  • Join new data
  • Group toxins
  • Remove toxins with few occurrences
  • Map genus to snake family
## # A tibble: 233 x 43
##   Snake Genus Species Reference Country SVMPi `DC-fragment` CRISP `3Ftx`   PLB
##   <chr> <chr> <chr>   <chr>     <chr>   <dbl>         <dbl> <dbl>  <dbl> <dbl>
## 1 Agki… Agki… biline… https://… Mexico      0           0     0        0     0
## 2 Agki… Agki… biline… https://… Costa …     0           0     0        0     0
## 3 Agki… Agki… biline… https://… Mexico      0           0     5.6      0     0
## 4 Agki… Agki… contor… https://… Unknown     0           0.1   3.7      0     0
## # … with 229 more rows, and 33 more variables: `Cro-toxin` <dbl>, …
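
The augmentation steps could be sketched roughly as follows. This is a minimal sketch on toy data, not the project's actual code: the toy tables, the `Rare` toxin, the `Vipera sp1`/`Vipera sp2` names, the `min_samples` threshold, the `family_lookup` table, and the choice to treat unreported toxins as 0 % are all assumptions made for illustration. Toxin grouping is not shown.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Toy stand-ins for the two tables; names and values are illustrative only.
main_data <- tibble(
  Snake = c("Agkistrodon bilineatus", "Naja kaouthia", "Naja kaouthia"),
  SVMP  = c(24.5, 0, 1.2),
  PLA2  = c(10.0, 12.2, 13.0),
  Rare  = c(0, 0, 0.3)            # hypothetical toxin seen in one sample only
)
new_data <- tibble(
  Toxin        = c("SVMP", "PLA2"),
  `Vipera sp1` = c(13.4, 30.9),   # placeholder snake names
  `Vipera sp2` = c(42.9, NA)
)

# Hypothetical genus-to-family lookup table.
family_lookup <- tibble(
  Genus  = c("Agkistrodon", "Naja", "Vipera"),
  Family = c("Viperidae", "Elapidae", "Viperidae")
)

# Join new data: turn toxin rows into columns so that each row is one snake.
new_wide <- new_data %>%
  pivot_longer(-Toxin, names_to = "Snake", values_to = "pct") %>%
  pivot_wider(names_from = Toxin, values_from = pct)

min_samples <- 2  # hypothetical cut-off for "few occurrences"

augmented <- bind_rows(main_data, new_wide) %>%
  # Assumption: a toxin that was not reported counts as 0 %.
  mutate(across(where(is.numeric), ~ replace_na(.x, 0))) %>%
  # Drop toxins detected in fewer than `min_samples` samples.
  select(where(~ !is.numeric(.x) || sum(.x > 0) >= min_samples)) %>%
  # Split the full name into genus and the remaining epithet(s).
  separate(Snake, into = c("Genus", "Species"), sep = " ",
           extra = "merge", remove = FALSE) %>%
  # Map genus to snake family.
  left_join(family_lookup, by = "Genus")

print(augmented)
```

Reshaping the new data first keeps the join itself a plain row bind, since both tables then share the one-row-per-snake layout.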

Geographical overview of samples

Snakes from richer countries, or from countries with a focus on snake research, are overrepresented.

Snake family count

Snake family

Most abundant toxins

Compare two snakes

Intra-species comparison

Shiny app

Results from PCA and K-means
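
For readers unfamiliar with the two methods, the unsupervised step could look roughly like the sketch below, using base R's `prcomp` and `kmeans`. The toy venom matrix, the two hypothetical families "A" and "B", the use of scaling, and the choice of two components and two clusters are assumptions for illustration, not the settings used in the project.

```r
set.seed(1)

# Toy venom matrix: 20 samples x 3 toxin percentages (illustrative values).
n <- 10
group_a <- cbind(SVMP = rnorm(n, 40, 3), PLA2 = rnorm(n, 20, 3),
                 `3Ftx` = rnorm(n, 2, 0.5))
group_b <- cbind(SVMP = rnorm(n, 3, 0.5), PLA2 = rnorm(n, 15, 3),
                 `3Ftx` = rnorm(n, 60, 3))
venom  <- rbind(group_a, group_b)
family <- rep(c("A", "B"), each = n)   # hypothetical family labels

# PCA on centred and scaled toxin percentages.
pca    <- prcomp(venom, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]                 # first two principal components

# K-means on the PCA scores; nstart reruns with several random starts.
km <- kmeans(scores, centers = 2, nstart = 10)

# Cross-tabulate clusters against the known labels.
tab <- table(family, cluster = km$cluster)
print(tab)
```

The cross-table is one simple way to judge how well the clusters line up with known taxonomy.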

Prediction models based on venom composition

  • A simple vanilla ANN correctly classified the entire test set (25% of the data)
    • Specifications: 4 hidden neurons, learning rate = 0.001, n_epochs = 100, loss criterion = binary cross-entropy
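
To make the specifications concrete, here is a hand-rolled base R sketch of a network with one hidden layer of 4 neurons, trained for 100 epochs at learning rate 0.001 with a binary cross-entropy loss. The slides do not state the framework, optimiser, activation functions, or batch size, so the sigmoid activations, full-batch gradient descent, weight initialisation, and toy data below are assumptions. With these settings the toy loss only moves slightly.

```r
set.seed(42)

sigmoid <- function(z) 1 / (1 + exp(-z))

# Train a one-hidden-layer network with full-batch gradient descent.
train_ann <- function(x, y, n_hidden = 4, learning_rate = 0.001,
                      n_epochs = 100) {
  n  <- nrow(x)
  w1 <- matrix(rnorm(ncol(x) * n_hidden, sd = 0.5), ncol(x), n_hidden)
  b1 <- rep(0, n_hidden)
  w2 <- matrix(rnorm(n_hidden, sd = 0.5), n_hidden, 1)
  b2 <- 0
  loss <- numeric(n_epochs)
  eps  <- 1e-12                                   # guards against log(0)
  for (epoch in seq_len(n_epochs)) {
    # Forward pass.
    h     <- sigmoid(sweep(x %*% w1, 2, b1, "+")) # n x n_hidden
    p_hat <- sigmoid(h %*% w2 + b2)               # n x 1
    # Binary cross-entropy, recorded before this epoch's update.
    loss[epoch] <- -mean(y * log(p_hat + eps) +
                         (1 - y) * log(1 - p_hat + eps))
    # Backward pass (gradients of the mean loss).
    d_out <- (p_hat - y) / n
    d_hid <- (d_out %*% t(w2)) * h * (1 - h)
    # Gradient descent update.
    w2 <- w2 - learning_rate * (t(h) %*% d_out)
    b2 <- b2 - learning_rate * sum(d_out)
    w1 <- w1 - learning_rate * (t(x) %*% d_hid)
    b1 <- b1 - learning_rate * colSums(d_hid)
  }
  list(w1 = w1, b1 = b1, w2 = w2, b2 = b2, loss = loss)
}

# Predicted probability of class 1 for each row of x.
predict_ann <- function(model, x) {
  h <- sigmoid(sweep(x %*% model$w1, 2, model$b1, "+"))
  as.vector(sigmoid(h %*% model$w2 + model$b2))
}

# Toy two-class data: 40 samples, 2 features (illustrative only).
x <- rbind(matrix(rnorm(40, mean = -1), ncol = 2),
           matrix(rnorm(40, mean = 1), ncol = 2))
y <- rep(c(0, 1), each = 20)

fit   <- train_ann(x, y)
probs <- predict_ann(fit, x)
print(round(range(fit$loss), 4))
```

Writing the loop out by hand shows where each of the four listed settings enters the training procedure.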

Training of ANN predicting snake family

Analysis of incorrect labels (1)

  • If the test set is increased to 40% of the data, the model misclassifies 5 snakes, as illustrated below:

Analysis of incorrect labels (2)

Analysis of the venom composition of the incorrectly labeled snakes:

## # A tibble: 5 x 2
##   Snake                    Family   
##   <chr>                    <chr>    
## 1 Daboia russelii russelii Viperidae
## 2 Hydrophis cyanocinctus   Elapidae 
## 3 Micropechis ikaheka      Elapidae 
## 4 Naja kaouthia            Elapidae 
## 5 Naja kaouthia            Elapidae
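
A table like the one above could be produced along the lines of the sketch below, which holds out a fraction of the samples and lists the rows where prediction and true family disagree. The helpers `split_indices` and `misclassified`, the simple (non-stratified) random split, the toy snakes, and the hand-made `predicted` vector are all hypothetical. The slides do not say how the split was drawn.

```r
# Randomly assign a fraction of the n samples to the test set.
split_indices <- function(n, test_fraction, seed = 1) {
  set.seed(seed)
  test <- sort(sample.int(n, size = round(n * test_fraction)))
  list(train = setdiff(seq_len(n), test), test = test)
}

# List the samples whose predicted family differs from the true one.
misclassified <- function(data, predicted) {
  wrong <- data$Family != predicted
  data.frame(Snake     = data$Snake[wrong],
             Family    = data$Family[wrong],
             Predicted = predicted[wrong],
             stringsAsFactors = FALSE)
}

# Toy data: 10 placeholder snakes from two families.
snakes <- data.frame(
  Snake  = paste("snake", 1:10),
  Family = rep(c("Viperidae", "Elapidae"), each = 5),
  stringsAsFactors = FALSE
)

idx      <- split_indices(nrow(snakes), test_fraction = 0.40)
test_set <- snakes[idx$test, ]

# Stand-in for model output: correct everywhere except the first test row.
predicted    <- test_set$Family
predicted[1] <- ifelse(predicted[1] == "Viperidae", "Elapidae", "Viperidae")

errors <- misclassified(test_set, predicted)
print(errors)
```

In a real run, `predicted` would come from the trained model rather than being written by hand.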

Shiny app

Static plots

Compare two snakes

Intra-species comparison